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Abstract 


We  present  a  sequential  factorization  method  for  recovering  the  three-dimensional  shape  of  an  object  and 
the  motion  of  the  camera  from  a  sequence  of  images,  using  tracked  features.  The  factorization  method 
originally  proposed  by  Tomasi  and  Kanade  produces  robust  and  accurate  results  incoiporating  the 
singular  value  decomposition.  However,  it  is  still  difficult  to  ^ly  the  metlK)d  to  real-time  !y>plications 
since  it  is  based  on  a  batch-type  operation  and  the  cost  of  the  sin^ar  value  decomposition  is  laige.  We 
develop  the  factorization  method  into  a  sequential  method  by  regarding  the  feature  positions  as  a  vector 
time  series.  The  new  method  produces  estimates  of  shape  and  motion  at  each  frame.  The  singular  value 
decomposition  is  replaced  with  an  updating  computation  of  only  three  dominant  eigenvectors,  which  can 
be  performed  in  time,  while  the  complete  singular  value  decomposition  requires  operations  for  a  matrix. 
Also,  the  method  is  able  to  handle  infinite  sequences  since  it  does  not  store  any  increasingly  large 
matrices.  Experiments  using  synthetic  and  real  images  illustrate  that  the  method  has  nearly  the  same 
accuracy  and  robustness  as  the  original  method. 

This  research  is  ^nsored  by  the  Departmoit  of  tlw  Army,  Army  Researx^  Office  under  Grant  No. 
DAAH04-94-G-000O.  Hie  views  and  conclusions  contained  in  this  document  are  diose  of  the  authors  and 
should  not  be  interpreted  as  represoiting  die  official  policies,  either  expressed  or  implied,  of  die  DOA  or 
the  U.S.  Government 
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1.  Introduction 


Recovering  both  the  3D  shape  of  an  object  and  the  motion  of  the  camera  simultaneously 
from  a  stream  of  images  is  an  important  task  and  has  wide  applicability  in  many  tasks  such 
as  navigation  and  robot  manipulation.  Tomasi  and  Kanade[l  j  first  developed  a  factorization 
method  to  recover  shape  and  motion  under  an  orthographic  projection  model,  and  obtained 
robust  and  accurate  results.  Poelman  and  Kanade[2]  have  extended  the  factorization  method 
to  scaled-orthographic  projection  and  paraperspective  projection.  This  method  closely 
approximates  perspective  projection  in  most  practical  situations  so  that  it  can  deal  with 
image  sequences  which  contain  perspective  distortions. 

Although  the  factorization  method  is  a  useful  technique,  its  applicability  is  so  far  limited  to 
off-line  computations  for  the  following  reasons.  First,  the  method  is  based  on  a  batch-type 
computation;  that  is,  it  recovers  shape  and  motion  after  all  the  input  images  are  given.  Sec¬ 
ond,  the  singular  value  decomposition,  which  is  the  most  important  procedure  in  the 
method,  requires  O  (FP^)  operations  for  P  features  in  F  frames.  Finally,  it  needs  to  store  a 
large  measurement  matrix  whose  size  increases  with  the  number  of  frames.  These  draw¬ 
backs  make  it  difficult  to  apply  the  factorization  method  to  real-time  applications. 

This  report  presents  a  sequential  factorization  method  that  considers  the  input  to  the  system 
as  a  vector  time  series  of  feature  positions.  The  method  produces  estimates  of  shape  and 
motion  at  each  input  frame.  A  covariance-like  matrix  is  stored  instead  of  feature  positions, 
and  its  size  remains  constant  as  the  number  of  frames  increases.  The  singular  value  decom¬ 
position  is  replaced  with  a  computation,  updating  only  three  dominant  eigenvectors,  which 
can  be  performed  in  O  (P^)  time.  Consequently,  the  method  becomes  recursive. 

We  first  briefly  review  the  factorization  method  by  Tomasi  and  Kanade.  We  then  present  our 
sequential  factorization  method  in  Section  3.  The  algorithm's  performance  is  tested  using 
synthetic  data  and  real  images  in  Section  4. 
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2.  Theory  of  the  Factorization  Method:  Review 


2.1  Formalization 

The  input  to  the  factorization  method  is  a  measurement  matrix  W,  representing  image  posi¬ 
tions  of  tracked  features  over  multiple  frames.  Assuming  that  there  are  P  features  over  F 
frames,  and  letting  be  the  image  position  of  feature  p  at  frame  /,  W  is  nlFxP 

matrix  such  that 


•*11  •••  ^ip 

•  • 

•  • 

•*F1  •••  *F/* 

>11  -  yip 


(1) 


jyFi  —  ypp] 

Each  column  of  W  contains  all  the  observations  for  a  single  point,  while  each  row  contains 
all  the  observed  x-coordinates  or  y-coordinates  for  a  single  frame. 


Suppose  that  the  camera  orientation  at  frame  /  is  represented  by  orthonormal  vectors  jp 
and  kp  where  fy  corresponds  to  the  x-axis  of  the  image  plane  and  to  the  y-axis.  The  vec¬ 
tors  if  and  jf  are  collected  over  F  frames  into  a  motion  matrix  M  e  such  diat 


t 


(2) 


Let  be  the  location  of  feature  p  in  a  fixed  world  coordinate  system,  whose  origin  is  set  at 
the  center-of-mass  of  all  the  feature  points.  These  vectors  are  then  collected  into  a  shape 
matrix  Se  such  that 


S  -  [sj  ... 

Note  that 


(3) 


4 


(4) 


pml 

>\^th  this  notation,  the  following  equation  holds  by  assuming  an  orthographic  projection. 


W  =  MS  (5) 

Tomasi  and  Kanade[l]  pointed  out  the  simple  fact  that  the  rank  of  is  at  most  3  since  it  is 
the  product  of  the  2Fx  3  motion  matrix  M  and  the  3  x  P  shiq)e  matrix  S.  Based  on  this 
rank  theory,  they  developed  a  factorization  method  that  robustly  recovers  the  matrices  M 
and  5  from  W. 

2.2  Subspace  Computation 

The  actual  procedure  of  the  factorization  method  consists  of  two  steps.  First,  the  measure¬ 
ment  matrix  is  factorized  into  two  matrices  of  rank  3  using  the  singular  value  decoti^si- 
tion.  Assume,  without  loss  of  generality,  that  2F^P.  By  computing  the  singular  value 

decomposition  of  we  can  obtain  orthogonal  matrices  and 

Ve  such  that 


W  =  UIV^,  (6) 

where  £  =  diagCOj,  02,  and  Cj  ^  02  ^  >  0.  In  reality,  the  rank  of  W  is  not  exactly  3, 

but  approximately  3.  U  is  made  from  the  first  three  columns  of  the  left  singular  matrix  of  W. 
Likewise,  £  consists  of  the  first  three  singular  values  and  V  is  made  from  the  first  three  rows 
of  the  right  singular  matrix.  By  setting 

^  =  C/  and  5  =  £1^  (7) 

we  can  factorize  W  into 

W  =  (8) 

where  the  product  is  the  best  possible  rank  three  approximation  to  W. 

It  is  well  known  that  the  left  singular  vectors  U  span  the  column  space  of  W  while  the  right 
singular  vectors  V  span  its  row  space.  The  span  of  (/,  namely  motion  space,  determines  the 
motion,  and  the  span  of  V,  namely  shape  space,  determines  the  shape.  The  rank  theory 
claims  that  the  dimension  of  each  subspace  is  at  most  three,  and  the  first  step  of  the  factor¬ 
ization  method  finds  those  subspaces  in  the  high  dimensional  input  spaces.  Both  spaces  are 
said  to  be  dual  in  the  sense  that  one  of  them  can  be  computed  ftom  the  otho:.  This  observa¬ 
tion  helps  us  to  further  develop  the  sequential  factorization  method. 
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23  Metric  TVansformation 


The  decomposition  of  equation  (8)  is  not  completely  unique:  it  is  unique  only  up  to  an  affine 
transformation.  The  second  step  of  the  method  is  necessary  to  find  a  3  x  3  non-singular 
matrix  A ,  which  transforms  JQT  and  ^  into  the  true  solutions  M  and  5  as  follows. 


M  =  fyA  (9) 

S  =  A~'S  (10) 

Observing  that  rows  if  and  if  of  M  must  satisfy  the  normalization  constraints, 

landijjy=0.  (11) 

we  obtain  the  system  of  3F  overdetermined  equations  such  that 

ifLif=  1 

ffLjf=l  02) 

ffL]f=Q 

where  L  e  ^  is  a  symmetric  matrix 

L  =  A^A  (13) 

and,  if  and  ^  are  the  rows  of  X^.  By  denoting  ij  =  [i^j,  i^] ,  ij  =  [jfiJfiyjfl] .  and 


/j  12 

=  ^2  ^4  ^5  ’ 

h  ^5  k 

the  system  (12)  can  be  rewritten  as 

(14) 

Gl  =  c, 

where  G  €  1  e  R^,  and  c  e  R^^  are  defined  by 

(15) 

6 


and 


G  = 


*’■«!.  il) 


fUiJO 


t^UpJF) 

g^dvJi) 


.  /  = 


,e  = 


2F 


(16) 


{ap  bp  -  \afP>fy  2a^dy2  ^/2^/2 

The  simplest  solution  of  the  system  is  given  by  the  pseudo-inverse  method  such  that 


7  =  (G^G)'*G^c.  (18) 

The  vector  7  determines  the  symmetric  matrix  L,  whose  eigendecomposition  gives  A.  As  a 
result,  the  motion  M  and  the  shape  S  are  derived  according  to  equations  (9)  and  (10). 

The  matrix  A  is  an  affine  transform  which  transforms  iGt  into  Af  in  the  motion  space,  while 
the  matrix  A~^  transforms  S  into  5  in  the  shape  space.  Obtaining  this  transform  is  the  main 
purpose  of  the  second  step  of  the  factorization  method,  which  we  call  metric  tranrformation. 
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3.  A  Sequential  Factorization  Method 


3.1  Overview 

In  the  original  foctorization  method,  there  was  one  measuieinent  matrix  W  containing 
tracked  feature  positions  throughout  the  image  sequence.  After  all  the  input  images  are 
given  and  the  feature  positions  are  collected  into  tlw  matrix  the  motion  and  sluqw  are 
then  computed.  In  real-time  applications,  however,  it  is  not  feasible  to  use  this  batch-type 
scheme.  It  is  more  desirable  to  obtain  an  estimate  at  each  moment  sequentially.  The  input  to 
the  system  must  be  viewed  as  a  vector  time  series.  At  frame  /,  two  vectors  containing  fea¬ 
ture  positions  such  that 

xj  =  [x^,x^,  ...,Xyp]  andyj  =  . y^p]  (19) 

are  given.  Immediately  after  receiving  these  vectors,  the  system  must  compute  the  estimates 
of  the  camera  coordinates  ip  jy  and  the  shape  5^  at  that  frame.  At  the  next  frame,  new  sam¬ 
ples  x^^  I  and  y^^  j  arrive  and  new  camera  coordinates  |  and  j  are  to  be  computed  as 
well  as  an  updated  shape  estimate  ^ . 

The  key  to  developing  such  a  sequential  method  is  to  observe  that  the  shape  does  not  change 
over  time.  The  shape  space  is  stationary,  and  thus,  it  should  be  possible  to  derive  from 
I  without  performing  expensive  computations. 

More  specifically,  we  store  the  feature  vectors  x^  and  jy  in  a  covariance-type  matrix 
ZyS  defined  recursively  by 

Zf  =  Zf_i+  XjxJ +y^J.  (20) 

As  shown  later,  the  rank  of  is  at  most  three  and  its  three  dominant  eigenvectors  span 
the  shape  space.  Once  Qj  is  obtained,  the  camera  coordinates  at  frame  /  can  be  computed 
simply  by  multiplying  the  feature  vectors  and  the  eigenvectors  as  follows. 

ij  =  xJC/.  ft  =  jjQf  (21) 

This  framewoiic  makes  it  possible  to  estimate  camera  coordinates  immediately  after  receiv¬ 
ing  feature  vectors  at  each  frame.  All  information  obtained  up  to  the  frame  is  accumulated  in 
Qf  and  used  to  produce  the  estimates  at  that  frame. 

In  equation  (20),  the  size  of  is  fixed  to  P  x  P,  which  only  depends  on  the  number  of  fea¬ 
ture  points.  Therefore,  the  algorithm  does  not  need  to  store  any  matrices  whose  sizes 
increase  over  time. 

The  computational  effort  in  the  original  factorization  method  is  dominated  by  the  cost  of  the 
singular  value  decomposition.  In  the  framework  above,  we  need  to  compute  eigenvectors  of 
Zp  Note  that,  however,  we  only  need  the  first  three  dominant  eigenvectors.  Fortunately,  sev¬ 
eral  methods  exist  to  compute  only  the  dominant  eigenvectors  with  much  less  computation 
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necessary  to  compute  all  the  eigenvectors.  Before  describing  the  details  of  our  algorithm,  we 
briefly  review  these  techniques  in  the  following  section. 

3.2  Iterative  Eigenvector  Computation 

Among  the  existing  methods  which  can  compute  dominant  eigenvectors  of  a  square  matrix, 
we  introduce  two  methods,  the  power  method  and  orthogonal  iteration[3].  The  power 
method  is  the  simplest,  which  computes  the  most  dominant  eigenvector,  i.e.,  an  eigenvector 
associated  with  the  largest  eigenvalue.  It  provides  the  starting  point  for  most  other  tech¬ 
niques,  and  is  easy  to  understand.  The  method  of  orthogonal  iteration,  which  we  adopt  in 
our  method,  is  able  to  compute  several  dominant  eigenvectors. 

3.2.1  Power  Method 

Assume  that  we  want  to  compute  the  most  dominant  eigenvectors  of  an  n  x  n  matrix  B. 
Given  a  unit  2-norm  the  power  method  iteratively  computes  a  sequence  of  vec¬ 

tors  q  : 

for  k  =  1, 2, ... 


end 

The  second  step  of  the  iteration  is  simply  a  normalization  that  prevents  q  from  becoming 
veiy  large  or  very  small.  The  vectors  q  generated  by  the  iteration  converge  to  the  most 
dominant  eigenvector  of  B.  To  examine  the  convergence  property  of  the  power  method,  sup¬ 
pose  that  B  is  diagonalizable.  That  is,  X~^BX  =  diag(Xj,  with  an  orthogonal 

matrix  X  =  [x^,  ...,xj  ,  and  |Xj|  >  [Xj]  ^  •••  ^  If 

q^^^  =  biXi+b^2+  (22) 

and  b^  9^  0,  then  it  follows  that 


where  c  is  a  constant.  Since  |Xj|  >  |X2|  ^ ...  ^  |X^|,  equation  (23)  shows  that  the  vectors 
point  more  and  more  accurately  toward  the  du^ction  of  tli«  dominant  eigenvector  Xj, 
and  the  convergence  factor  is  the  ratio  r  =  |X2/Xj| . 
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3.2.2  Orthogonal  Iteration 


A  straightforward  generalization  of  the  power  method  can  be  used  to  compute  several  dom¬ 
inant  eigenvectors  of  a  symmetric  matrix.  Assuiitt  that  we  want  to  compute  p  dominant 
eigenvectors  of  a  symmetric  matrix  where  \^p^n.  Starting  with  an  nxp 

matrix  Qq  with  orthonormal  columns,  the  method  of  orthogonal  iteration  generates  a 
sequence  of  matrices  Qt  ^ 

fork  =  1,2, ... 

=  bq,., 

=  y*  (QR  factorization) 


end 

The  second  step  of  the  above  iteration  is  the  QR  factorization  of  where  G*  Is  an  orthog¬ 
onal  matrix  and  is  an  upper  triangular  matrix.  The  QR  factorization  can  achieved  by 
the  Gram-Schmidt  process.  This  step  is  viewed  as  a  normalization  process  that  is  similar  to 
the  normalization  used  in  the  power  method. 

Suppose  that  X^BX  =  diag  (Xj, ...,  is  the  eigendecomposition  of  B  with  an  orthogo¬ 
nal  matrix  X  -  [jcj,  ...,jcJ  ,  and  |Xj|  >  [Xj]  ^  ...  ^  jXJ.  It  has  been  shown  in  [3]  that  the 

subspace  range  generated  by  the  iteration  converges  to  span  {Xj . x^}  at  a  rate 

proportional  to  [X^^  j/X^j ,  i.e.. 


dist  (range  (Q^) ,  range  (Xp  ) 


(24) 


where  =  [x  j, . . .,  x^]  and  c  is  a  constant.  The  function  dist  represents  the  subspace  dis¬ 
tance  dehned  by 


dist  (range  (Qp ,  range  (Xp )  =  || QkQl-XpXH^  (25) 

The  method  offers  an  attractive  alternative  to  the  singular  value  decomposition  in  situations 
where  B  is  a  large  matrix  and  a  few  of  its  largest  eigenvalues  are  needed.  In  our  case,  these 
conditions  clearly  hold.  In  addition,  the  rank  theory  of  the  factorization  methodfl]  guaran¬ 
tees  that  the  ratio  iX^/X^j  is  very  small,  and  as  a  result,  we  should  achieve  fast  convergence 
for  computing  the  first  three  eigenvectors. 

3.3  Sequential  Factorization  Algorithm 

As  in  the  original  method,  the  sequential  factorization  method  consists  of  two  steps,  sequen¬ 
tial  shape  space  computation  and  sequential  metric  transformation. 
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33.1  A  Sequential  Shape  Space  Computation 

In  the  sequential  factorization  method,  we  consider  the  feature  vectors,  xj  and  yj,  as  a  vec¬ 
tor  time  series.  Let  us  denote  the  measurement  matrix  in  the  original  factorization  method  at 
frame  /  by  Wy.  Then,  it  grows  in  the  following  manner: 


W,  = 


71 

y\ 


yf 


w,= 


7l 

yi 


Now  let  us  define  a  matrix  time  series  Z^e  by 

From  the  definition,  it  follows  that 


(26) 


(27) 


Zy  =  WjWy.  (28) 

Since  the  rank  of  Wy  is  at  most  three,  the  rank  of  Zy  is  also  at  most  three.  If 

Wy  =  U^yVj  (29) 

is  the  singular  value  decomposition  of  Wy,  where  UyE  VyE  R^^^  art  orthogonal 

matrices,  and  Zy  =  diag  (o^  j,  Cfy  j.  3) » then 

Zy=  {VjLyVjfufiyVj  =  VjLjvJ,  (30) 

This  means  the  eigenvectors  of  ^  are  equivalent  to  the  right  singular  vectors  Vy  of  Wy. 
Hence,  it  is  possible  to  obtain  the  shape  space  by  computing  the  eigenvectors  of  Zy. 

To  compute  Vy,  we  combine  orthogonal  iteration  with  updating  by  equation  (27).  Given  a 
P  X  3  matrix  Qq  with  orthonormal  columns  and  a  null  matrix  Zq  e  R^^^,  the  following 
algorithm  generates  a  sequence  of  matrices  QyE  R^^^: 

[Algorithm  (l)]/or/=  1,  2, .... 

(1)  Zy  =  Zy_  1  +XyXj  +y^J 
i2)Y=ZyQy_^ 

(3)  QyR  =  Y  (QR  factorization) 


II 


end 


The  index  /  corresponds  to  the  frame  number  and  each  iteration  is  performed  frame  by 
frame.  The  matrix  generated  by  the  algorithm  is  expected  to  converge  to  the  eigenvectors 
of  Zy.  While  the  original  orthogonal  iteration  works  with  a  fixed  matrix,  the  above  algo¬ 
rithm  works  with  the  matrix  which  varies  finom  iteration  to  iteration  incorporating  new 
features.  In  other  words,  the  sequential  factorization  method  folds  the  update  of  into  the 
orthogonal  iteration.  If  the  range  (Vp  randomly  changes  over  time,  no  convergence  is 
expected  to  appear.  However,  it  can  be  shown  that 

range  (V^)  =  range  (Wj)  =  range  (S^)  ,  for  all/.  (31) 

Therefore,  range  is  stationary  and  ratige(Gy)  converges  to  range  (Vy)  as  in  the 
orthogonal  iteration.  Even  when  noise  exists,  if  the  noise  is  uncorrelated  or  the  noise  space 
is  orthogonal  to  the  signal  space  range  ( Vp ,  then  range  (Vp  is  still  equal  to  range  (5^) 
and  the  convergence  can  be  shown.  The  following  convergence  rate  of  the  algorithm  is 
deduced  from  the  convergence  rate  of  the  orthogonal  iteration. 


dist (range (Qp, range (Vy)) 


k=i 


'k.4 


(32) 


3.3.2  Stationary  Basis  for  the  Shape  Space 

Algorithm  (1)  presented  in  the  previous  section  produces  the  matrix  which  converges  to 
the  matrix  Vy  that  spans  the  shape  space.  The  true  shape  and  motion  are  determined  from 
the  shape  space  by  a  metric  transformation.  It  is  not  straightforward  at  this  point,  however, 
to  apply  the  metric  transformation  sequentially.  The  problem  is  that,  even  though 
range  (Vy)  is  stationary,  the  matrix  Vy  itself  changes  as  the  number  of  frames  increases. 
This  is  due  to  the  nature  of  singular  vectors.  They  are  the  basis  for  the  row  and  column  sub¬ 
spaces  of  a  matrix,  and  the  singular  value  decomposition  chooses  them  in  a  special  way. 
They  are  more  than  just  orthonormal.  As  a  result,  they  rotate  in  the  3D  subspace 
range  ( Vy) .  Recall  that  the  matrix  A  obtained  in  metric  transformation  (9)  is  a  transform 
from  Xfy  (or  UJ)  to  A/y  in  the  subspace  range  (ifiTy) .  Since  Vy  changes  at  each  frame,  C/y 
also  changes.  (Jonsequently,  the  matrix  A  also  changes  frame  by  frame. 

For  clarity,  let  us  denote  an  A  matrix  at  frame  /  as  Ay.  The  fact  that  Ay  changes  at  each 
frame  makes  it  difficult  to  estimate  Ay  iteratively  and  efficiently.  Thus  we  need  to  add  an 
additional  process  to  obtain  stationary  oasis  for  the  shape  space  to  update  matrix  Ay. 

Let  us  define  a  projection  matrix  flye  onto  the  range  (Qy)  by 

Hf  =  GyC/.  (33) 

where  Qj  is  the  output  from  Algorithm  (1).  Needless  to  say,  the  rank  of  is  at  most  three. 
Since  range  (0/)  (=range  (A/y) )  is  stationary,  the  projection  matrix  must  be  stationary. 
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It  is  thus  possible  to  obtain  the  stationary  basis  for  the  shape  space  by  replacing  with  the 
eigenvectors  of 

An  iterative  method  similar  to  Algorithm  (1)  can  be  used  to  reduce  the  amount  of  computa¬ 
tion.  Given  a  3  matrix  Qq  with  oithonormal  colunuis,  the  iterative  method  below  gener¬ 
ates  a  matrix  Q^e  which  provides  the  stationary  basis  for  the  shape  space. 

[Algorithm  (2)]/or  f  =  1. 2, .... 

H,  =  QfiJ 

QjR  =  Y  (QR  factorization) 


end 


3.33  Sequential  Metric  lyansformation 

In  the  previous  section,  we  derived  the  shape  space  in  terms  of  Q^.  Once  is  obtained,  it  is 
possible  to  compute  camera  coordinates  if  and  Jf  as 

V  =  XfQp  ff  =  yjQf  (34) 

These  coordinates  are  used  to  solve  the  overdetermined  equations  (12)  and  the  true  camera 
coordinates  are  recovered  in  the  same  way  as  in  the  original  method.  Doing  so,  however, 
requires  storing  all  the  coordinates  if  and  Jf,  the  number  of  which  may  be  very  large. 
Instead,  we  use  the  following  sequential  algorithm. 


[Algorithm  (3)]/or/=  1,  2 . 

'iT  J-pr  -ppr 

</  =  >c}Q,.  If  =  ffQf 


Df  =  Df.  I  +g  (Jyv  lf)gY(^f  if)  +g  0fjf)g^0f}f)  +g(tfjf)g'^(tfjf) 


Ef=  Ey_i*g(}f,  if)  +  g  Of,}/) 


end 

Let  Gf  and  Cj  be  the  matrices  G  and  c  at  frame  /,  where  G  and  c  are  defined  in  Section  2.3 
From  the  definition,  it  follows  that 


GjCy 


(35) 


Assigning  equations  (35)  and  (36)  to  equation  (18),  we  have 


(36) 
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If  =  d;*£,  (37) 

which  gives  the  symmetric  matrix  Lf.  The  eigendecomposition  of  Lj  yields  the  affine  trans¬ 
form  Af  and,  as  a  result,  the  camera  coordinates  and  the  shape  are  ootained  as  follows: 

T*  ^7*  T* 

«/=*/-*/  Jf=SfA,  (38) 

S,  =  A}'Q,  (39) 

Algorithm  (3)  followed  by  equations  (37),  (38),  and  (39)  completes  the  sequential  method. 
The  size  of  matrices  Df  and  Ef  are  fixed  to  6  x  6  and  6x1,  and  the  method  does  not  store 
any  matrices  that  grow,  even  in  the  sequential  metric  transformation. 
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4.  Experiments 


4.1  Synthetic  Data 

In  this  section  we  compare  the  accuracy  of  our  sequential  factorization  method  with  that  of 
the  original  factorization  method.  Since  both  methods  are  essentially  based  on  the  rank  the¬ 
ory,  we  do  not  expect  any  difference  in  the  results.  Our  purpose  here  is  to  confirm  that  the 
sequential  method  has  tite  same  accuracy  of  sh2y>e  and  mc^on  recovery  as  the  original 
method. 

4.1.1  Data  Generation 

The  object  in  this  experiment  consists  of  100  random  feature  points.  The  sequences  are  cre¬ 
ated  using  a  perspective  projection  of  those  points.  The  image  coordinates  of  each  point  are 
perturbed  by  adding  Gaussian  noise,  which  we  assume  to  simulate  tracking  error  and  image 
noise.  The  standard  deviation  of  the  Gaussian  noise  is  set  to  two  pixels  of  a  512  x  512  pixel 
image.  The  distance  of  the  object  center  from  the  camera  is  fixed  to  ten  times  the  object  size. 
The  focal  length  is  chosen  so  that  the  projection  of  the  object  covers  the  whole  512  x  512 
image.  The  camera  is  rotated  as  shown  in  Figure  1,  while  the  object  is  translated  to  keep  its 
image  at  the  image  center.  Quantization  errors  are  no^  added  since  we  assume  that  we  are 
able  to  track  features  with  a  subpixel  resolution. 


0  20  40  60  80  100  120  140 


Frame 

Figure  1  IVue  camera  motion 

The  camera  roll,  pitch,  and  yaw  are  varied  as  shown  in  this  figure.  The 
sequence  consists  of  ISO  fiat^. 
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When  discussing  the  accuracy  of  the  sequential  method,  one  needs  to  consider  its  dynamic 
property  regarding  the  3D  recovery.  The  accuracy  of  the  recovery  at  a  particular  frame  by 
the  sequential  method  depends  on  the  total  amount  of  motion  up  to  that  time,  since  the 
recovery  is  made  only  from  the  information  obtained  up  to  that  time.  At  the  beginning  of  an 
image  sequence,  for  example,  the  motion  is  generally  small,  so  high  accuracy  can  not  be 
expected.  The  accuracy  generally  improves  as  the  motion  becomes  larger.  The  original 
method  does  not  have  this  dynamic  property,  since  it  is  based  on  a  batch-type  scheme  and 
uses  all  the  information  throughout  the  sequence. 

In  order  to  compare  both  methods  under  the  same  conditions,  we  perform  the  following 
computations  beforehand.  First,  we  form  a  submatrix  which  only  contains  the  feature 
positions  up  to  frame  /.  The  original  factorization  is  applied  to  the  submatrix,  then  the 
results  are  kept  as  solutions  at  frame  /.  They  are  the  best  estimates  given  e  original 

method.  Repeating  this  process  for  each  frame,  we  derive  the  best  estimates,  vhich  our 

results  are  compared. 

4.1.2  Accuracy  of  the  Sequential  Shape  Space  Computation 

We  first  discuss  the  convergence  property  of  the  sequential  shape  space  computation.  The 
sequential  factorization  method  starts  with  Algorithm  (1)  in  Section  3.3.1,  iteratively  ener- 
ating  the  matrix  Qj:  which  is  an  estimate  for  the  true  shape  space  S^.  Let  us  represent  the 
estimation  error  with  respect  to  the  true  shape  space  by 

=  dist  (span  ((2y),  span  (S^))  (40) 

Recall  that  the  function  dist  provides  a  notion  of  difference  between  two  spaces.  On  the 
other  hand,  the  original  method  produces  the  best  estimate  for  the  shape  space  by  comput¬ 
ing  the  right  singular  vectors  of  the  submatrix  and  its  error  with  respect  to  the  true 
shape  space  is  also  represented  by 

Eg  =  dist  (span  (Vy),  span  (S^))  (41) 

Comparing  both  errors.  Figure  2  shows  that  they  are  almost  identical.  That  is,  the  errors 
given  by  the  sequential  method  are  almost  equal  to  those  given  by  the  original  method. 

At  the  beginning  of  the  sequence,  the  amount  of  motion  is  small  and  both  errors  are  rela¬ 
tively  large.  The  ratio  of  the  4th  to  3rd  singular  values,  shown  in  Figure  3,  also  indicates  that 
it  is  difficult  to  achieve  good  accuracy  at  the  beginning.  Both  errors,  however,  quickly 
become  smaller  as  the  camera  motion  becomes  larger.  After  about  the  2()th  frame,  constant 
errors  of  3  x  10”^  are  observed  in  this  experiment. 

The  solutions  given  by  the  two  methods  are  so  close  that  the  graphs  are  completely  over¬ 
lapped.  Thus,  we  also  plot  their  difference  defined  by 

A£  =  dist  (span  (Qy),  span  (Vy))  (42) 

in  Figure  4.  Although  A£  is  relatively  large  at  the  beginning,  it  quickly  becomes 
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Ratio 


Figure!  Shape  space  errors 

Sh^  space  esdmation  errors  by  the  sequential  method  (solid  line)  and  the 
original  method  (dashed  line)  with  respect  to  the  true  shape  space.  The 
errors  are  defined  by  subspace  distance  and  plotted  logarithmically. 


Figures  Singular vaiue ratio 

The  ratio  of  the  4th  to  3rd  singular  values,  that  is  o^/Oj. 
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Figure  4  Difference  of  shape  space  errors 
The  difference  of  the  estimates  by  the  sequential  and  original  mediods, 
versus  the  frame  number.  The  difference  is  plotted  logarithmically. 


very  small.  In  fact,  after  about  the  30th  frame,  A£  is  less  than  1  x  10"^,  while  £,  and  Eg 
are  both  3x  10*^. 

4.13  Accuracy  of  the  Motion  and  Shape  Recovery 

The  three  plots  of  Rgure  5  show  errors  in  roll,  pitch,  and  yaw  in  the  recovered  motion;  the 
solid  lines  correspond  to  the  sequential  method,  the  dotted  lines  to  the  original  method.  The 
difference  in  motion  errors  between  the  original  and  sequential  methods  is  quite  small. 

Both  results  are  unstable  for  a  short  period  at  the  beginning  of  the  sequence.  After  that,  they 
show  two  kinds  of  errors:  random  and  structural.  Random  errors  are  due  to  Gaussian  noise 
added  to  the  feature  positions.  Structural  errors  are  due  to  perspective  distortion,  and  relate 
to  the  motion  patterns.  The  structural  errors  show  a  negative  peak  at  about  the  60th  frame 
and  are  almost  constant  between  the  90th  and  120th  frames.  Note  the  pattern  corresponds  to 
the  motion  pattern  shown  in  Rgure  1. 

Of  course,  these  intrinsic  errors  cannot  be  eliminated  in  the  sequential  method.  The  point  to 
observe  is  that  the  differences  between  the  two  solutions  are  sufhciently  smaller  than  the 
intrinsic  errors. 

Shape  errors  which  are  compared  in  Rgure  6  also  indicate  the  same  results.  Again,  the  dif¬ 
ferences  between  the  two  methods  are  quite  small  compared  to  the  intrinsic  errors  which  the 
original  method  possesses.  Note  that  no  Gaussian  noise  appears  in  the  shtqpe  errors  since 
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they  are  averaged  over  all  the  feature  points. 

We  conclude  from  these  results  that  the  sequential  method  is  nearly  as  accurate  as  the  origi¬ 
nal  method  except  that  some  extra  frames  are  required  to  converge. 
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Figures  Motion  errors 

Errors  of  recovered  camera  roll  (top),  pitch  (middle),  and  yaw  (bottom).  The 
errors  given  Iqr  the  sequential  medrad  ate  plotted  with  solid  lines,  while  the  etron 
given  by  the  original  method  are  plotted  with  dotted  lines. 
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Figure  6  Shape  error 

This  figure  compares  the  shqw  eiion  given  by  the  two  method.  The  enon  given 
by  the  sequential  method  ate  plotted  with  solid  lines,  while  the  errors  given  the 
oti^nal  method  ate  plotted  with  dotted  lines.  The  errors  are  conq>uted  as  the  root* 
mean-square  errors  of  the  recovered  shape  with  respect  to  the  true  shape,  at  each 
frame. 
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4.2  Real  Images 


Experiments  were  performed  on  two  sets  of  real  images.  The  first  set  is  an  image  sequence 
of  a  satellite  rotating  in  space.  Another  experiment  uses  a  long  video  recording  (764  images) 
of  a  house  taken  with  a  hand-held  camera.  These  experiments  demonstrate  tlK  applicability 
of  the  sequential  factorization  method  in  real  situations.  In  both  experiments,  features  are 
selected  and  tracked  using  the  method  presented  by  Tomasi  and  Kanade[l]. 

4.2.1  Satellite  Images 

Figure  7  shows  an  image  of  the  satellite  with  selected  features  indicated  by  small  squares. 
The  image  sequence  was  digitized  from  a  video  tecording[4]  actually  taken  by  a  space  shut¬ 
tle  astronaut.  The  feature  tracker  automatically  selected  and  tracked  32  features  tluoughout 
the  sequence  of  101  images.  Of  these,  five  features  on  the  astronaut  maneuvering  around  the 
satellite  were  manually  eliminated  because  they  had  a  different  motion.  Thus,  the  remaining 
27  features  were  processed.  Hgure  8  shows  the  recovered  motion  in  terms  of  roll,  pitch,  and 
yaw.  The  side  view  of  the  recovered  shape  is  displayed  in  Hgure  9,  where  the  features  on  the 
solar  panel  are  marked  with  opaque  squares  and  others  with  filled  squares.  No  ground-truth 
is  available  for  the  shape  or  the  motion  in  this  experiment.  Yet,  it  appears  that  the  solutions 
are  satisfactory,  since  the  features  on  the  solar  panel  almost  lie  in  a  single  line  in  the  side 
view. 


4.2.2  House  Images 

Figure  10  shows  the  first  image  of  the  sequence  used  in  the  second  experiment.  Using  a 
hand-held  camera,  one  of  the  authors  took  this  sequence  while  walking.  It  consists  of  764 
images  which  correspond  to  about  2S  seconds.  The  feature  tracker  detected  and  tracked  62 
features.  The  recovered  motion  and  shape  are  shown  in  Figures  1 1  and  12.  It  is  clearly  seen 
that  the  shape  is  qualitatively  correct.  It  is  also  reasonable  to  observe  that  only  the  camera 
yaw  is  increasing,  because  the  camera  is  moving  parallel  to  the  ground.  In  addition,  note  that 
the  computed  roll  motion  reveals  the  pace  of  the  recorder’s  steps,  which  is  about  1  step  per 
second. 

Further  evaluation  of  accuracy  in  these  experiments  is  difficult.  However,  this  qualitative 
analysis  of  the  results  with  real  images,  and  quantitative  analysis  of  the  results  with  syn¬ 
thetic  data  essentially  shows  that  the  sequent!^  method  works  as  well  with  teal  images  as 
the  original  batch  method. 
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Figure  7  An  image  of  a  satellite 
The  first  frame  of  the  satellite  image  sequence.  The  superimposed  squares 
indicate  the  selected  features. 
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Recovered  camera  roll  (solid  line),  pitch  (dashed  line),  and  yaw  (dotted 
line)  for  the  satellite  image  sequence. 


Figure  9  Side  view  of  the  recovered  shape 
A  side  view  of  the  recovered  shape  of  the  satellite.  The  features  on  the 
solar  panel  are  shown  with  opaque  squares  and  others  with  filled  squares. 
Notice  that  the  features  on  the  solar  panel  cmectly  lie  in  a  single  plane. 


24 


Figure  10  An  image  of  a  house 
The  first  fhone  of  the  house  image  sequence.  The  superimposed  squares 
indicate  the  selected  features. 
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Figure  11  Recovered  motion  of  house 
Recovered  camera  roll  (solid  line),  pitch  (dashed  line),  and  yaw  (dotted 
tine)  for  the  house  image  sequence. 


Figure  12  Top  view  of  the  recovered  shape 
A  view  of  the  recovered  shape  of  the  house  from  above.  The  features  on 
the  two  side  walls  are  correcdy  recovered. 
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4.3  Computational  Time 


Finally,  we  compare  the  processing  time  of  the  sequential  method  with  the  original  method. 
The  computational  complexity  of  the  original  method  is  dominated  by  the  cost  of  the  singu¬ 
lar  value  decomposition,  which  needs  14FP^+  llP^/3  computations  for  a  2PxP  mea¬ 
surement  matrix  with  2FtP  [5],  Note  that  F  corresponds  to  the  number  of  frames  and  P  to 
the  number  of  features.  On  the  other  hand,  the  complexity  of  the  sequential  method  is 
22P^  +  44P  for  computing  dominant  eigenvectors,  plus  4P^  for  updating  the  Zmatrix. 
Computing  the  solution  for  frame  F,  therefore,  takes  only  O  (P^)  using  the  sequential 
method,  while  the  original  method  would  require  O  (FP^)  operations. 

Figure  13  shows  the  actual  processing  time  of  the  sequential  method  on  a  Sun4/10  compared 
together  with  that  of  the  original  method.  Hie  number  of  features  varied  from  10  to  500, 
while  the  number  of  frames  was  fixed  at  120.  The  processing  time  for  selecting  and  tracking 
features  was  not  included.  The  singular  value  decomposition  of  the  original  method  is  based 
on  a  routine  found  in  [6].  The  results  sufficiently  agree  with  our  analysis  above.  In  addition, 
when  the  number  of  features  is  less  than  40,  the  sequential  method  is  possible  to  run  within 
1/30  ms,  which  means  video-rate  processing  on  a  Sun4/10. 


Figure  13  Processing  time 

The  processing  time  of  the  sequential  method  on  a  Sun4/10  (solid  line) 
compared  with  that  of  the  original  method  (dotted  line),  as  a  function  of 
the  number  of  features  which  is  varied  from  10  to  500.  The  number  of 
frames  is  fixed  at  120. 
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5.  Conclusions 


We  have  presented  the  sequential  factorization  method,  which  provides  estimates  of  shape 
and  motion  at  each  frame  from  a  sequence  of  images.  The  method  produces  as  accurate  and 
robust  results  as  the  original  method,  while  significantly  reducing  the  computational  com¬ 
plexity.  The  reduction  in  complexity  is  important  for  applying  the  factorization  method  to 
real-time  applications.  Furthermore,  the  method  does  not  require  storing  any  growing  matri¬ 
ces  so  that  its  implementation  in  VLSI  or  DSP  is  feasible. 

Faster  convergence  in  the  shape  space  computation  could  be  achieved  using  more  sophisti¬ 
cated  algorithms  such  as  the  orthogonal  iteration  with  Ritz  acceleration[3]  instead  of  the 
basic  orthogonal  iteration.  Also,  it  is  possible  to  use  scaled  orthographic  projection  or  parq)- 
erspective  projection[2]  to  improve  the  accuracy  of  the  sequential  factorization  method. 
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